Learning Complex Similarity Measures
Abstract
Case-based reasoning (CBR) is a knowledge processing concept that has shown success in various problem classes. One key challenge in CBR is the construction of a measure that adequately models the similarity between two cases. Typically, a similarity measure consists of a set of feature-specific distance functions coupled with an underlying feature weighting (importance) scheme. While the definition of the distance functions is often straightforward, the estimation of the weighting scheme requires a deep understanding of the domain and the underlying connections. This paper addresses this problem. It shows how discrimination knowledge, which is coded within an already solved classification problem, can be transformed into a similarity measure. Moreover, it demonstrates our approach on the problem of diagnosing heart diseases.

1 Background and Related Theory

Discrimination knowledge that is coded within an already solved classification problem can be transformed into a similarity measure of a case-based reasoning (CBR) system. This chapter first points out relationships between classification and similarity assessment in a case-based reasoning system. It then motivates and defines a generic transformation procedure from a case base to a similarity measure; the last two sections of this chapter discuss related aspects of its realization. Chapter 2 presents an application, the diagnosis of heart diseases, to demonstrate the development of a similarity measure on a real-world problem.

1.1 Classification and Case-based Reasoning

Let x denote a problem or some description of a situation. Then a common task is to find another problem y amongst a set S of problems, such that x is more similar to y than it is to any other z ∈ S. Using the terminology of case-based reasoning, we are given a pair 〈CB, sim〉, where CB, the case base, denotes a set of cases, and sim denotes a similarity measure, sim : CB × CB → [0, 1]. With x, y, and z ∈ CB, the semantics of sim is as follows:
sim(x, y) > sim(x, z) ⇔ “x is more similar to y than it is to z.”
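The pair 〈CB, sim〉 and the retrieval task described above can be illustrated with a minimal Python sketch. It is not the construction proposed in the paper: it assumes that sim is a normalized weighted sum of per-feature similarities (one minus a feature distance scaled to [0, 1]), and the names make_similarity, most_similar, and scaled_abs_distance, as well as the example weights and value ranges, are hypothetical and chosen only for illustration.

```python
from typing import Callable, List, Sequence

Case = Sequence[float]
Distance = Callable[[float, float], float]


def scaled_abs_distance(value_range: float) -> Distance:
    """Feature distance |a - b| scaled by an assumed value range, clipped to [0, 1]."""
    def distance(a: float, b: float) -> float:
        return min(abs(a - b) / value_range, 1.0)
    return distance


def make_similarity(distances: List[Distance],
                    weights: List[float]) -> Callable[[Case, Case], float]:
    """Build sim : CB x CB -> [0, 1] as a weighted sum of (1 - feature distance).

    The weights are normalized to sum to 1 so that sim(x, x) = 1.
    This weighted-sum form is an assumption made for illustration.
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    normalized = [w / total for w in weights]

    def sim(x: Case, y: Case) -> float:
        return sum(w * (1.0 - d(a, b))
                   for w, d, a, b in zip(normalized, distances, x, y))
    return sim


def most_similar(x: Case, case_base: List[Case],
                 sim: Callable[[Case, Case], float]) -> Case:
    """Return the case y in the case base that maximizes sim(x, y)."""
    return max(case_base, key=lambda y: sim(x, y))


# Hypothetical two-feature example: the first feature is weighted three times
# as heavily as the second; the value ranges 10 and 100 are assumptions.
sim = make_similarity(
    distances=[scaled_abs_distance(10.0), scaled_abs_distance(100.0)],
    weights=[3.0, 1.0],
)
case_base = [(6.0, 90.0), (9.0, 50.0)]
query = (5.0, 50.0)
best = most_similar(query, case_base, sim)
```

In this sketch the weights play the role of the feature weighting scheme mentioned in the abstract: changing them can change which case is retrieved, which is why estimating them well matters.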
Similar resources
Presentation of an efficient automatic short answer grading model based on combination of pseudo relevance feedback and semantic relatedness measures
Automatic short answer grading (ASAG) is the automated process of assessing natural-language answers using computational methods and machine learning algorithms. The development of large-scale smart education systems on the one hand, and the importance of assessment as a key factor in the learning process, together with the challenges it faces, on the other hand, have significantly increased the need for ...
Learning Behavior from Demonstration in Minecraft via Symbolic Similarity Measures
This paper focuses on the challenging problem of learning behavior in a complex environment purely from observation of human performance. Specifically, we explore the performance of a collection of symbolic similarity measures in modeling the behavior of a human performing tasks in the Minecraft video game using learning from demonstration. We also analyze the performance of these measures usin...
Classification using non-standard metrics
A large variety of supervised or unsupervised learning algorithms is based on a metric or similarity measure of the patterns in input space. Often, the standard Euclidean metric is not sufficient, and much more efficient and powerful approximators can be constructed based on more complex similarity calculations such as kernels or learning metrics. This procedure is beneficial for data in euclide...
TATO: Leveraging on Multiple Strategies for Semantic Textual Similarity
In this paper, we describe the TATO system which participated in the SemEval-2015 Task 2a: “Semantic Textual Similarity (STS) for English”. Our system is trained on published datasets from the previous competitions. Based on some machine learning techniques, it combines multiple similarity measures of varying complexity ranging from simple lexical and syntactic similarity measures to complex se...
Evaluation of Different Similarity Measures for the Extraction of Multiword Units in a Reinforcement Learning Environment
In this paper, we present an application of Genetic Algorithms to extract Multiword Units (i.e., complex lexical units such as compound nouns, idiomatic expressions or phrase templates). For that purpose, a fitness function will be defined whose maximization will serve as a basis for the identification of pertinent word n-grams (i.e., ordered vectors of words) based on different similarity measures...